Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
Overfitting is one of the most critical challenges in deep neural networks, and various regularization methods exist to improve generalization performance. Injecting noise into hidden units during training, e.g., dropout, is known to be a successful regularizer, but it is still not entirely clear why such training techniques work well in practice or how to maximize their benefit given two conflicting objectives: fitting the true data distribution and preventing overfitting through regularization. This paper addresses these issues by 1) showing that conventional training with regularization by noise injection optimizes a lower bound of the true objective and 2) proposing a technique that achieves a tighter lower bound by using multiple noise samples per training example in each stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications.
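The technique described in the abstract lends itself to a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a PyTorch classifier whose injected noise comes from dropout layers kept active during training, draws several independent stochastic forward passes per example, and combines the resulting log-likelihoods with log-mean-exp rather than a plain average, which is the multiple-noise-sample lower bound the abstract refers to. `model`, `x`, `y`, and `num_noise_samples` are hypothetical names.

```python
import math
import torch
import torch.nn.functional as F

def multi_noise_loss(model, x, y, num_noise_samples=5):
    """Sketch of a tighter-lower-bound loss using several noise samples per example.

    Assumes `model` is a classifier whose stochasticity comes from dropout layers
    left active (model.train()). Each forward pass draws an independent noise
    realization; the per-sample log-likelihoods are combined with log-mean-exp
    instead of being averaged, giving a tighter bound on log E_noise[p(y | x, noise)]
    than a single draw.
    """
    log_liks = []
    for _ in range(num_noise_samples):
        logits = model(x)                          # independent dropout mask on each call
        log_probs = F.log_softmax(logits, dim=1)   # per-class log-probabilities
        log_liks.append(log_probs.gather(1, y.unsqueeze(1)).squeeze(1))  # log p(y | x, noise_k)
    log_liks = torch.stack(log_liks, dim=0)        # shape (K, batch)
    # log (1/K) * sum_k p(y | x, noise_k): the multi-sample lower bound
    bound = torch.logsumexp(log_liks, dim=0) - math.log(num_noise_samples)
    return -bound.mean()                           # minimize the negative bound
```

With `num_noise_samples=1` this reduces to the usual dropout training loss; larger values tighten the bound at the cost of additional forward passes per SGD iteration.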
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)
Reviews: Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
This paper introduces a method for regularizing deep neural networks by noise. The core of the approach is to draw a connection between applying a random perturbation to layer activations and the optimization of a lower bound of the objective function. Experiments on four visual tasks are carried out and show a slight improvement of the proposed method over dropout.
On the positive side:
- The problem of regularization when training deep neural networks is a crucial issue with large potential practical and theoretical impact.
On the negative side:
- The aforementioned connection between regularization by noise and lower-bounding the training objective appears to be a straightforward adaptation of [9] to the case of deep neural networks. For the most important result, given in Eq. (6), i.e., the fact that using several noise samples gives a tighter bound on the objective function than using a single random sample (as done in dropout), the authors refer to the derivation in [9].
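For context, the multi-sample result the reviewer points to (Eq. (6)) appears to be of the importance-weighting form derived in [9]; writing $p(y \mid x, \epsilon)$ for the network likelihood under a noise realization $\epsilon$ and $K$ for the number of noise samples, a sketch of the relation is

\[
\log \mathbb{E}_{\epsilon}\big[p(y \mid x, \epsilon)\big]
\;\geq\;
\mathbb{E}_{\epsilon_1,\dots,\epsilon_K}\Big[\log \frac{1}{K}\sum_{k=1}^{K} p(y \mid x, \epsilon_k)\Big]
\;\geq\;
\mathbb{E}_{\epsilon}\big[\log p(y \mid x, \epsilon)\big],
\]

where the middle quantity is non-decreasing in $K$ and $K = 1$ recovers standard single-sample noise injection such as dropout.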
Noh, Hyeonwoo; You, Tackgeun; Mun, Jonghwan; Han, Bohyung
Published at the Neural Information Processing Systems Conference.